1 Abstract
This study investigates key physiological risk factors associated with the severity of obstructive sleep apnoea (OSA). Using multiple linear regression, we analyze the relationship between the arousal index (ai), a measure of sleep disruption, and several predictors: age, body mass index (BMI), neck size, and systolic blood pressure (SBP). The results reveal that neck size, SBP, and age are statistically significant predictors of OSA severity, while BMI shows minimal contribution after accounting for multicollinearity with neck size. Diagnostic plots confirm model assumptions of linearity and normality, though the model explains only about 22% of the variability in ai. This suggests that other unobserved variables, such as lifestyle or genetic factors, may also influence OSA severity.
2 Introduction
Obstructive Sleep Apnoea (OSA) is a common sleep disorder characterized by repetitive pauses in breathing due to upper airway obstruction. These interruptions lead to sleep fragmentation, hypoxia, and increased cardiovascular risk. Understanding physiological predictors of OSA severity is essential for developing effective screening and prevention strategies.
In this study, the arousal index (ai), representing the frequency of awakenings per hour, is used as the dependent variable. Four predictors are considered based on their known associations with OSA:
Body Mass Index (BMI) – a general indicator of body fat.
Neck Size – reflects airway obstruction potential.
Systolic Blood Pressure (SBP) – captures cardiovascular strain.
Age – accounts for physiological changes increasing OSA risk.
The study provides insights into how physiological and demographic factors correlate with sleep disturbances, helping guide further research and clinical screening in the context of OSA.
My goal here is to determine which of these variables significantly predict ai and to assess the overall performance and appropriateness of the regression model.
Find the data here.
3 Exploratory Analysis
3.1 Scatter-plot Matrix
Relationships between the response and predictors
ai appears to have a moderate positive correlation with sbp, neck_size, and age. bmi and neck_size appear to be highly correlated.
Among the predictors, bmi and neck size are highly correlated, which suggests that multicollinearity could be a concern in regression modeling. This correlation indicates that both variables are related to body composition.
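One common way to quantify this concern is the variance inflation factor (VIF), \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor \(j\) on the other predictors. The sketch below computes it by hand in base R. Because the sleep data are not reproduced in this report, it runs on simulated values: the data frame `toy`, the helper `vif_by_hand`, and every number in it are illustrative assumptions, not results from this study.

```r
# Simulated data for illustration only (NOT the sleep data set)
set.seed(1)
n <- 200
neck_size <- rnorm(n, mean = 40, sd = 3)
bmi <- 0.9 * neck_size + rnorm(n, sd = 1.5)  # built to correlate with neck_size
age <- rnorm(n, mean = 50, sd = 10)          # independent of the other two
toy <- data.frame(bmi, neck_size, age)

# Hypothetical helper: VIF for one predictor, from the R-squared obtained
# by regressing that predictor on all the other columns
vif_by_hand <- function(data, predictor) {
  others <- setdiff(names(data), predictor)
  fit <- lm(reformulate(others, response = predictor), data = data)
  1 / (1 - summary(fit)$r.squared)
}

vifs <- sapply(names(toy), function(p) vif_by_hand(toy, p))
round(vifs, 2)
```

Values near 1 indicate little collinearity, while common rules of thumb treat values above 5 or 10 as a warning sign. With the real data, the same helper could be applied to the four predictor columns of `sleep`.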
3.2 Model Fitting and Interpretation
Fitting the linear regression model using ai as the response variable and the other variables as the predictors.
Call:
lm(formula = ai ~ bmi + neck_size + sbp + age, data = sleep)
Coefficients:
(Intercept) bmi neck_size sbp age
-0.159406 -0.009852 0.040627 0.010218 0.008789
Significant predictors (p < 0.05) included neck size, sbp, and age, whereas bmi was not statistically significant.
3.2.1 Producing a 95% CI that quantifies the change in ai for each extra cm of neck size:
95% confidence interval (\(\alpha = 0.05\)):
2.5 % 97.5 %
neck_size 0.01248892 0.06876571
Therefore, with 95% confidence, each 1 cm increase in neck size is associated with an increase in the arousal index (ai) of between 0.012 and 0.069 on the log scale, holding bmi, sbp, and age constant.
This provides strong evidence that neck size is an important risk factor for obstructive sleep apnoea (OSA) severity, as measured by the frequency of arousal during sleep.
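The interval can be checked by hand from the reported estimate and standard error, using \(\hat\beta \pm t_{0.975,\,117} \times SE(\hat\beta)\). The sketch below uses only numbers copied from the coefficient table in this report; the variable names are illustrative.

```r
# Values copied from the coefficient table of the full model
estimate  <- 0.040627  # estimated neck_size slope
std_error <- 0.014208  # its standard error
df_resid  <- 117       # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci, 4)
```

With the fitted model object available, `confint(model, "neck_size", level = 0.95)` should give the same interval directly (here `model` stands for the fitted `lm` object, whose name is not shown in the report).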
3.2.2 F-test For The Overall Regression
Call:
lm(formula = ai ~ bmi + neck_size + sbp + age, data = sleep)
Residuals:
Min 1Q Median 3Q Max
-1.67136 -0.32269 0.01491 0.35778 1.47595
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.159406 0.518207 -0.308 0.75893
bmi -0.009852 0.011312 -0.871 0.38557
neck_size 0.040627 0.014208 2.859 0.00503 **
sbp 0.010218 0.003555 2.875 0.00481 **
age 0.008789 0.002964 2.965 0.00367 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5417 on 117 degrees of freedom
Multiple R-squared: 0.2471, Adjusted R-squared: 0.2213
F-statistic: 9.598 on 4 and 117 DF, p-value: 9.54e-07
neck_size, sbp and age are each significantly associated with the response variable, arousal index (ai), at the 5% level.
3.2.3 Fitting in the Multiple Regression Model
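The model is:
\[ai_i = \beta_0 + \beta_1\,\text{bmi}_i + \beta_2\,\text{neck\_size}_i + \beta_3\,\text{sbp}_i + \beta_4\,\text{age}_i + \varepsilon_i\]
Model Parameters:
\(ai_i\) = arousal index for the i-th individual (log scale)
\(\beta_0\) = intercept
\(\beta_1\) = coefficient for bmi
\(\beta_2\) = coefficient for neck_size
\(\beta_3\) = coefficient for sbp
\(\beta_4\) = coefficient for age
\(\varepsilon_i\) = random error for the i-th individual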
This model aims to assess the overall relationship between the arousal index and the set of physiological predictors: Body Mass Index (bmi), neck size, Systolic Blood Pressure (sbp), and age.
3.2.4 Hypothesis for the Overall ANOVA Test:
The null hypothesis (\(H_0\)) states that none of the predictors has an effect on the response variable, the arousal index (ai).
- \(H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0\)
The alternative hypothesis (\(H_1\)) states that at least one of the predictors has an effect on the arousal index (ai).
- \(H_1: \beta_i \ne 0\) for at least one \(i \in \{1, 2, 3, 4\}\)
3.2.5 ANOVA Table for the Overall Model
Analysis of Variance Table
Response: ai
Df Sum Sq Mean Sq F value Pr(>F)
bmi 1 1.725 1.7250 5.8777 0.0168638 *
neck_size 1 3.288 3.2881 11.2040 0.0010982 **
sbp 1 3.674 3.6739 12.5185 0.0005789 ***
age 1 2.580 2.5798 8.7904 0.0036707 **
Residuals 117 34.337 0.2935
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
value numdf dendf
9.597645 4.000000 117.000000
F-test for this regression model: \(F(4, 117) = 9.598\), \(p = 9.54 \times 10^{-7}\).
Since the p-value is well below 0.05, we reject the null hypothesis. This indicates that at least one of the predictors is significantly related to the response variable.
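The overall F-statistic can also be recovered from the ANOVA table: it is the mean square for the four predictors combined, divided by the residual mean square. The sketch below does this arithmetic with the sums of squares copied from the table; the variable names are illustrative.

```r
# Sequential sums of squares copied from the ANOVA table
ss_predictors <- c(bmi = 1.7250, neck_size = 3.2881, sbp = 3.6739, age = 2.5798)
ss_residual   <- 34.337
df_model      <- length(ss_predictors)  # 4 predictors
df_residual   <- 117

# Mean squares and the overall F ratio
ms_model    <- sum(ss_predictors) / df_model
ms_residual <- ss_residual / df_residual
f_stat      <- ms_model / ms_residual
round(f_stat, 2)  # agrees with the reported 9.598 up to rounding
```

The individual rows of the table are sequential (Type I) tests, so they depend on the order in which the predictors enter the model; their sum, and hence the overall F-statistic, does not.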
3.2.6 Null Distribution for the test statistic
Under the null hypothesis (\(H_0\)), the F-statistic follows an F-distribution with 4 and 117 degrees of freedom.
Equivalently, when \(H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0\) holds, the test statistic satisfies \(F \sim F_{4,117}\).
3.2.7 P-value
The corresponding P-value for this overall regression model is:
\[\text{p-value} = 9.54 \times 10^{-7}\]
(or \(0.000000954\))
This p-value is taken from the model summary in Section 3.2.2.
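The p-value is the upper-tail probability of the \(F_{4,117}\) distribution beyond the observed statistic. A minimal sketch in base R, using the F value reported above, also shows the 5% critical value for comparison:

```r
# Observed overall F-statistic and its degrees of freedom, as reported
f_stat <- 9.597645
df1 <- 4
df2 <- 117

# Upper-tail probability under the null F distribution
p_value <- pf(f_stat, df1 = df1, df2 = df2, lower.tail = FALSE)

# Critical value at alpha = 0.05; H0 is rejected if f_stat exceeds it
f_crit <- qf(0.95, df1 = df1, df2 = df2)

signif(p_value, 3)
f_stat > f_crit
```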
3.2.8 Findings
Statistical Conclusion:
As the p-value from the overall F-test is extremely small (\(9.54 \times 10^{-7}\)), that is below the significance level (\(\alpha = 0.05\)), we reject the null hypothesis that none of the predictors (bmi, neck_size, sbp, age) are related to the response variable.
Contextual Conclusion:
There is strong evidence that at least one of the predictors is significantly associated with the arousal index (ai). The regression model as a whole therefore has statistically significant explanatory power for ai.
In plain terms, the predictors jointly explain more of the variation in OSA severity than would be expected by chance.
3.3 Model Validation & Appropriateness
Checking residual vs fitted plots for linearity and constant variance
The residuals are generally evenly scattered around the horizontal line at 0, which supports the assumption of linearity. There is no distinct shape or pattern, apart from a slight upward curve on the right.
Additionally, there are a few outliers present, at points 68, 69, and 79; however, they do not demonstrate excessive influence.
Checking the normality of residuals
The normal Q-Q plot of the standardized residuals shows that the residuals mostly follow the reference line, with very slight deviations at the tails. This implies that the normality assumption is reasonably met, with no major concerns about non-normality.
3.4 \(R^2\) and Its Significance
[1] 0.221317
Adjusted \(R^2\) = 0.2213. This is a low value: only about 22% of the variability in ai is explained by the predictor variables, while the remaining 78% is due to factors not included in the model or to random variation.
The model therefore has limited predictive power and needs refinement.
4 Improving the Model
4.1 Checking for predictor significance
bmi is not a significant predictor of ai in the full model (p = 0.386; see the summary in Section 3.2.2), and it was strongly correlated with neck_size in the exploratory analysis. I therefore remove it and re-evaluate the model assumptions.
Call:
lm(formula = ai ~ age + neck_size + sbp, data = sleep)
Residuals:
Min 1Q Median 3Q Max
-1.65415 -0.35334 0.04008 0.37534 1.45627
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.066166 0.506509 -0.131 0.89629
age 0.009007 0.002951 3.053 0.00280 **
neck_size 0.032630 0.010831 3.013 0.00317 **
sbp 0.009579 0.003475 2.757 0.00676 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5412 on 118 degrees of freedom
Multiple R-squared: 0.2422, Adjusted R-squared: 0.2229
F-statistic: 12.57 on 3 and 118 DF, p-value: 3.452e-07
The adjusted \(R^2\) increases very slightly, from 0.221 to 0.223. The overall F-test p-value falls from 9.54e-07 to 3.452e-07 and remains well below 0.05, so the fit has not worsened. All three remaining predictors are significant.
The normal Q-Q plot is close to linear, with no significant deviations.
The residuals vs fitted plot shows residuals scattered fairly evenly around zero, with no clear trend or pattern apart from a slight upward deviation on the right.
It is therefore reasonable to treat the linearity and normality assumptions as met.
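Therefore, the final fitted model, with coefficients taken from the summary above, is:
\[\widehat{ai} = -0.0662 + 0.0090\,\text{age} + 0.0326\,\text{neck\_size} + 0.0096\,\text{sbp}\]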
4.1.1 Comparing Model Fits
| Model | R² | Adjusted R² | Predictors | Comment |
|---|---|---|---|---|
| Original | 0.2471 | 0.2213 | BMI, Neck Size, SBP, Age | BMI not significant |
| Refined | 0.2422 | 0.2229 | Neck Size, SBP, Age | Slight improvement |
The \(R^2\) dropped marginally (from 0.2470586 to 0.2421771) because a predictor was removed; \(R^2\) can never increase when a predictor is dropped. The adjusted \(R^2\) penalizes the number of predictors in the model, so it rises only when the removed variable contributed less than its penalty. The increase in adjusted \(R^2\) (from 0.221317 to 0.2229104) therefore indicates that the refined model is more parsimonious with no loss of explanatory power. bmi contributed little to explaining ai in the original model, so its removal results in a more efficient model.
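Both adjusted values can be reproduced from the reported \(R^2\) figures with \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\), where \(p\) is the number of predictors. The sketch below assumes \(n = 122\), inferred from the 117 residual degrees of freedom plus the 5 estimated coefficients of the original model; the helper name `adj_r2` is illustrative.

```r
# Hypothetical helper: adjusted R-squared from R-squared,
# sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 122  # assumption: 117 residual df + 5 coefficients in the original model

full    <- adj_r2(0.2470586, n, p = 4)  # bmi, neck_size, sbp, age
refined <- adj_r2(0.2421771, n, p = 3)  # neck_size, sbp, age

round(c(full = full, refined = refined), 4)
```

The refined model loses a little raw \(R^2\) but gains a residual degree of freedom, and the second effect is slightly larger here, which is why the adjusted value goes up.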
5 Discussion
The findings confirm that neck size, blood pressure, and age are important factors associated with OSA severity. BMI, although commonly linked to sleep apnoea, was not significant here once neck size was included, suggesting that neck circumference captures the effect of body mass more directly for airway obstruction.
Still, since the model only explains 22% of ai variation, other factors such as genetic traits, lifestyle habits, or anatomical structures likely play major roles.
6 Conclusion
This study shows that neck size, systolic blood pressure, and age are key predictors of OSA severity, measured by the arousal index.
Even though the model explains only a modest portion of the variability, it provides valuable insight into which physical traits are most strongly linked to disrupted sleep. Improving future models with additional lifestyle and physiological data could lead to better prediction and prevention strategies for OSA.